Scalable Distributed Change Detection from Astronomy Data Streams Using Local, Asynchronous Eigen Monitoring Algorithms
Abstract
This paper considers the problem of change detection using local, distributed eigen monitoring algorithms for the next generation of astronomy petascale data pipelines, such as the Large Synoptic Survey Telescope (LSST). This telescope will take repeat images of the night sky every 20 seconds, thereby generating 30 terabytes of calibrated imagery every night that will need to be co-analyzed with other astronomical data stored at different locations around the world. Change point detection and event classification in such data sets may provide useful insights into unique astronomical phenomena displaying astrophysically significant variations: quasars, supernovae, variable stars, and potentially hazardous asteroids. However, performing such data mining tasks on high-throughput, distributed data streams is challenging. In this paper, we propose a highly scalable, distributed, asynchronous algorithm for monitoring the principal components (PCs) of such dynamic data streams. We demonstrate the algorithm on a large set of distributed astronomical data to accomplish well-known astronomy tasks such as measuring variations in the fundamental plane of galaxy parameters. The proposed algorithm is provably correct (i.e., it converges to the correct PCs without centralizing any data) and can seamlessly handle changes to the data or the network. Real experiments performed on Sloan Digital Sky Survey (SDSS) catalogue data show the effectiveness of the algorithm.
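The core task described above, tracking the principal components of data held at several sites and noticing when they change, can be illustrated with a simplified sketch. It is not the paper's local, asynchronous algorithm. It assumes each site can share additive sufficient statistics (a count, per-feature sums, and sums of pairwise products), which merge by plain addition into the exact global covariance without moving raw rows. The dominant PC is found by power iteration, and a change is flagged when the previously computed eigenpair no longer satisfies C v = lambda v within a tolerance eps. All function names, the residual test, and the synthetic data are hypothetical choices made for illustration.

```python
import math
import random

def local_stats(rows):
    """Hypothetical per-site summary: count, per-feature sums, sums of products."""
    d = len(rows[0])
    s = [sum(r[j] for r in rows) for j in range(d)]
    ss = [[sum(r[j] * r[k] for r in rows) for k in range(d)] for j in range(d)]
    return len(rows), s, ss

def merge(stats):
    """Combine site summaries by addition; the result is independent of order."""
    d = len(stats[0][1])
    n = sum(st[0] for st in stats)
    s = [sum(st[1][j] for st in stats) for j in range(d)]
    ss = [[sum(st[2][j][k] for st in stats) for k in range(d)] for j in range(d)]
    return n, s, ss

def covariance(n, s, ss):
    """Population covariance E[x x^T] - mean mean^T from aggregated statistics."""
    mean = [x / n for x in s]
    d = len(s)
    return [[ss[j][k] / n - mean[j] * mean[k] for k in range(d)] for j in range(d)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_pc(c, iters=500):
    """Dominant eigenpair by power iteration (assumes a clear eigengap)."""
    v = [1.0] * len(c)
    for _ in range(iters):
        w = matvec(c, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(c, v)))  # Rayleigh quotient
    return v, lam

def pc_changed(c, v, lam, eps):
    """Assumed change test: residual ||C v - lambda v|| exceeds eps."""
    r = [a - lam * b for a, b in zip(matvec(c, v), v)]
    return math.sqrt(sum(x * x for x in r)) > eps

# Hypothetical synthetic data: 3 sites, points spread along one direction.
rng = random.Random(0)

def make_peer(direction, m):
    rows = []
    for _ in range(m):
        t = rng.gauss(0, 3)
        rows.append([t * direction[j] + rng.gauss(0, 0.1) for j in range(3)])
    return rows

before = [make_peer([1, 1, 0], 50) for _ in range(3)]
c0 = covariance(*merge([local_stats(p) for p in before]))
v, lam = top_pc(c0)

after = [make_peer([1, -1, 0], 50) for _ in range(3)]
c1 = covariance(*merge([local_stats(p) for p in after]))

print(pc_changed(c0, v, lam, 0.05), pc_changed(c1, v, lam, 0.05))
```

Run as written, this should print `False True`: the eigenpair fits the data it was computed from, but not the data after its dominant direction rotates from (1, 1, 0) to (1, -1, 0). Exchanging d + d^2 numbers per site is one simple way to avoid centralizing raw rows; the paper's algorithm goes further by having peers communicate only locally and asynchronously.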
Publication date: 2009